
Remove IterDataPipe from Inference pipeline #96

Merged (11 commits) on Oct 3, 2024

Conversation

@gitttt-1234 (Contributor) commented Sep 27, 2024

This PR removes the IterDataPipe dependency from the inference pipeline. Currently, we use IterDataPipe modules to build the data pipeline from labels when running inference on .slp files. Since we're moving to LitData (related issue: #80), we're removing the IterDataPipe modules and implementing a thread-based approach, similar to our current VideoReader (related issue: #26).
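The thread-based pattern can be sketched as follows. This is a minimal illustration with hypothetical names (`ThreadedReader`, `consume`), not the actual VideoReader API: a background thread converts items into per-sample dicts and pushes them into a bounded queue, and the consumer drains the queue until a sentinel marks end-of-stream.

```python
import threading
import queue

SENTINEL = None  # marks end of stream


class ThreadedReader(threading.Thread):
    """Read items on a background thread and push sample dicts into a bounded queue."""

    def __init__(self, items, max_queue_size=4):
        super().__init__(daemon=True)
        self.items = items
        self.out = queue.Queue(maxsize=max_queue_size)

    def run(self):
        for item in self.items:
            self.out.put({"frame": item})  # per-frame sample dict
        self.out.put(SENTINEL)  # signal that no more samples are coming


def consume(reader):
    """Drain the reader's queue until the sentinel is seen."""
    results = []
    while True:
        sample = reader.out.get()
        if sample is SENTINEL:
            break
        results.append(sample["frame"])
    return results


reader = ThreadedReader(range(5))
reader.start()
frames = consume(reader)
reader.join()
print(frames)  # [0, 1, 2, 3, 4]
```

The bounded queue gives backpressure: the reader thread blocks on `put` when the consumer falls behind, so memory use stays constant regardless of the number of frames.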

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced LabelsReaderDP for enhanced data handling in pipelines.
    • Added support for loading pre-trained model weights in ModelTrainer classes.
    • New attributes in CentroidCrop class to utilize ground-truth centroids during inference.
  • Bug Fixes

    • Adjusted expected output shapes in various tests to align with updated processing logic.
  • Documentation

    • Updated docstrings and comments to reflect changes in data provider and processing methods.
  • Tests

    • Refactored tests to use LabelsReaderDP and updated assertions based on new data handling logic.

@gitttt-1234 gitttt-1234 changed the base branch from main to divya/load-backbone-weights September 27, 2024 16:52

coderabbitai bot commented Sep 27, 2024

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The pull request introduces significant changes across multiple files, primarily focusing on updating the data provider from LabelsReader to LabelsReaderDP. This change affects various classes and methods, including CentroidConfmapsPipeline, BottomUpPipeline, and several test files. Additionally, new classes like VideoReader and modifications to the Predictor class enhance the functionality of the inference pipeline. The updates also include adjustments to expected output shapes in tests and the addition of parameters for loading pre-trained weights in the model trainer classes.

Changes

File Change Summary
sleap_nn/data/pipelines.py Updated data_provider type to LabelsReaderDP in make_training_pipeline. Added centroids_confidence_maps and part_affinity_fields to keep_keys.
sleap_nn/data/providers.py Renamed LabelsReader to LabelsReaderDP. Introduced VideoReader class. Modified from_filename method and __iter__ method in LabelsReaderDP.
sleap_nn/data/resizing.py Updated provider type in SizeMatcher from LabelsReader to LabelsReaderDP.
sleap_nn/inference/predictors.py Renamed video_preprocess_config to preprocess_config. Added instances_key. Updated make_pipeline method signatures and restructured _predict_generator.
sleap_nn/inference/topdown.py Added use_gt_centroids and anchor_ind attributes to CentroidCrop. Modified constructor and forward method to handle ground-truth centroids.
sleap_nn/training/model_trainer.py Added trained_ckpts_path parameter to constructors and methods in ModelTrainer and subclasses for loading pre-trained weights.
tests/assets/minimal_instance/training_config.yaml Updated prv_runid field in wandb configuration from unspecified to null.
tests/data/test_augmentation.py Changed import from LabelsReader to LabelsReaderDP.
tests/data/test_confmaps.py Updated usage of LabelsReader to LabelsReaderDP in test functions.
tests/data/test_edge_maps.py Replaced LabelsReader with LabelsReaderDP in import and test function.
tests/data/test_instance_centroids.py Updated import and instantiation of LabelsReader to LabelsReaderDP.
tests/data/test_instance_cropping.py Changed LabelsReader to LabelsReaderDP in test function.
tests/data/test_normalization.py Updated import and instantiation of LabelsReader to LabelsReaderDP.
tests/data/test_pipelines.py Replaced LabelsReader with LabelsReaderDP and adjusted expected output keys in tests.
tests/data/test_providers.py Updated LabelsReader to LabelsReaderDP in tests. Added test_labelsreader_provider for new class.
tests/data/test_resizing.py Changed import from LabelsReader to LabelsReaderDP.
tests/fixtures/datasets.py Updated provider key from "LabelsReader" to "LabelsReaderDP" and modified model configuration parameters.
tests/inference/test_bottomup.py Updated test function to load labels using sio.load_slp and simplified data handling.
tests/inference/test_predictors.py Adjusted peak_threshold parameters and expected lengths in multiple tests.
tests/inference/test_single_instance.py Simplified data handling in the test by removing the previous pipeline.
tests/inference/test_topdown.py Refactored data handling and inference processes, focusing on direct data manipulation.
tests/training/test_model_trainer.py Updated expected tensor shapes and added verification for loading trained weights in tests.

Possibly related PRs

Suggested reviewers

  • talmo

🐇 In the code we hop and play,
With LabelsReaderDP leading the way.
Pipelines updated, tests in line,
Pre-trained weights, oh how they shine!
From CentroidCrop to Predictor bright,
Our models now leap to new heights! 🌟



@gitttt-1234 gitttt-1234 marked this pull request as ready for review September 28, 2024 01:40

codecov bot commented Sep 28, 2024

Codecov Report

Attention: Patch coverage is 97.46835% with 4 lines in your changes missing coverage. Please review.

Project coverage is 97.44%. Comparing base (f093ce2) to head (b89af63).
Report is 14 commits behind head on main.

Files with missing lines Patch % Lines
sleap_nn/data/providers.py 95.55% 2 Missing ⚠️
sleap_nn/inference/predictors.py 97.56% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #96      +/-   ##
==========================================
+ Coverage   96.64%   97.44%   +0.79%     
==========================================
  Files          23       37      +14     
  Lines        1818     3559    +1741     
==========================================
+ Hits         1757     3468    +1711     
- Misses         61       91      +30     


Resolved review threads: sleap_nn/data/providers.py (1), sleap_nn/inference/predictors.py (6), sleap_nn/inference/topdown.py (3)
@gitttt-1234 gitttt-1234 requested a review from talmo October 2, 2024 02:09

@talmo talmo left a comment


You're still using LabelsReaderDP in a bunch of places in the test, but I think we want to test the new LabelsReader, no?

Resolved review threads: sleap_nn/data/resizing.py (2)
@gitttt-1234 gitttt-1234 changed the base branch from divya/load-backbone-weights to main October 3, 2024 21:25
@gitttt-1234 gitttt-1234 changed the base branch from main to divya/load-backbone-weights October 3, 2024 21:26
@gitttt-1234 gitttt-1234 changed the base branch from divya/load-backbone-weights to main October 3, 2024 21:28
@gitttt-1234 gitttt-1234 force-pushed the divya/remove-torchdata-inference branch from cf83eee to 0da9ad2 Compare October 3, 2024 21:48
@gitttt-1234 gitttt-1234 changed the base branch from main to divya/load-backbone-weights October 3, 2024 21:48

@coderabbitai coderabbitai bot left a comment


Caution

Inline review comments failed to post

Actionable comments posted: 27

🧹 Outside diff range and nitpick comments (26)
tests/inference/test_bottomup.py (3)

3-7: Consider removing unused imports

The imports for Path and shutil appear to be unused in the visible code. If they are not used elsewhere in this file, consider removing them to keep the imports clean and relevant.

If these imports are indeed unnecessary, you can apply the following diff:

-from pathlib import Path
-import shutil
import sleap_io as sio
from sleap_nn.data.providers import process_lf
from sleap_nn.data.normalization import apply_normalization
🧰 Tools
🪛 Ruff

3-3: pathlib.Path imported but unused

Remove unused import: pathlib.Path

(F401)


4-4: shutil imported but unused

Remove unused import: shutil

(F401)


36-38: Approved: Direct data loading and processing

The new approach of loading labels with sleap_io and processing them directly aligns well with the PR objective of removing IterDataPipe. This change simplifies the test setup and makes it more straightforward.

For improved clarity, consider adding a brief comment explaining the purpose of the unsqueeze operation:

 labels = sio.load_slp(minimal_instance)
 ex = process_lf(labels[0], 0, 2)
-ex["image"] = apply_normalization(ex["image"]).unsqueeze(dim=0)
+# Add batch dimension to match model input requirements
+ex["image"] = apply_normalization(ex["image"]).unsqueeze(dim=0)

Line range hint 67-102: Approved: Comprehensive inference testing

The updated test performs inference directly on the processed example and includes comprehensive assertions to verify the output structure and dimensions. This aligns well with the PR objective and ensures the inference model is working as expected.

To improve readability, consider grouping related assertions and adding comments to explain their purpose:

 output = inference_layer(ex)[0]
-assert "confmaps" not in output.keys()
-assert output["pred_instance_peaks"].is_nested
-assert tuple(output["pred_instance_peaks"][0].shape)[1:] == (2, 2)
-assert tuple(output["pred_peak_values"][0].shape)[1:] == (2,)
+# Verify that confmaps are not returned when not requested
+assert "confmaps" not in output.keys()
+
+# Check structure and dimensions of predicted peaks
+assert output["pred_instance_peaks"].is_nested
+assert tuple(output["pred_instance_peaks"][0].shape)[1:] == (2, 2)
+assert tuple(output["pred_peak_values"][0].shape)[1:] == (2,)

 # ... (code for second inference) ...

 output = inference_layer(ex)[0]
-assert tuple(output["confmaps"].shape) == (1, 2, 192, 192)
-assert tuple(output["part_affinity_fields"].shape) == (1, 96, 96, 2)
-assert output["pred_instance_peaks"].is_nested
-assert output["peaks"][0].shape[-1] == 2
-assert tuple(output["pred_instance_peaks"][0].shape)[1:] == (2, 2)
-assert tuple(output["pred_peak_values"][0].shape)[1:] == (2,)
+# Verify dimensions of returned confmaps and part affinity fields
+assert tuple(output["confmaps"].shape) == (1, 2, 192, 192)
+assert tuple(output["part_affinity_fields"].shape) == (1, 96, 96, 2)
+
+# Check structure and dimensions of predicted peaks
+assert output["pred_instance_peaks"].is_nested
+assert output["peaks"][0].shape[-1] == 2
+assert tuple(output["pred_instance_peaks"][0].shape)[1:] == (2, 2)
+assert tuple(output["pred_peak_values"][0].shape)[1:] == (2,)
tests/data/test_resizing.py (2)

19-19: Improve variable naming for better readability.

While the change to use LabelsReaderDP is correct, the variable name l is ambiguous and could be improved for better code readability.

Consider renaming the variable to something more descriptive, like labels_reader:

-    l = LabelsReaderDP.from_filename(minimal_instance)
-    pipe = SizeMatcher(l, provider=l)
+    labels_reader = LabelsReaderDP.from_filename(minimal_instance)
+    pipe = SizeMatcher(labels_reader, provider=labels_reader)
🧰 Tools
🪛 Ruff

19-19: Ambiguous variable name: l

(E741)


52-52: Consistently improve variable naming throughout the file.

The ambiguous variable name l is used again. For consistency and improved readability, consider updating this and all similar occurrences in the file.

Apply this change here and in other similar instances:

-    l = LabelsReaderDP.from_filename(minimal_instance)
+    labels_reader = LabelsReaderDP.from_filename(minimal_instance)
🧰 Tools
🪛 Ruff

52-52: Ambiguous variable name: l

(E741)

tests/data/test_confmaps.py (1)

110-110: LGTM: Consistent update to data pipeline initialization.

The change from LabelsReader to LabelsReaderDP is consistent with the previous modifications and aligns with the PR objectives.

The static analysis tool flagged an unused import of numpy. Consider removing this import if it's not used elsewhere in the file:

-import numpy as np
tests/data/test_edge_maps.py (2)

Line range hint 193-199: LGTM! Consider adding tests for thread-based functionality.

The change from LabelsReader to LabelsReaderDP is consistent with the import statement modification and aligns with the PR objective of removing IterDataPipe.

While the current test covers the basic functionality, consider adding specific tests for the thread-based approach mentioned in the PR objectives. This would ensure that the new implementation behaves correctly in a multi-threaded environment.

Example test cases could include:

  1. Concurrent access to LabelsReaderDP from multiple threads
  2. Performance comparison between the old and new implementations
  3. Thread safety of the LabelsReaderDP methods

Would you like assistance in drafting these additional test cases?
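One way such a concurrency test could look, using a plain `queue.Queue` as a stand-in for the reader's output (hypothetical names, not the actual LabelsReaderDP interface): several worker threads drain a shared queue, and we assert that every item was consumed exactly once regardless of thread interleaving.

```python
import threading
import queue


def run_concurrent_consumers(n_items=100, n_workers=4):
    """Drain a shared queue from several threads; return everything consumed."""
    q = queue.Queue()
    for i in range(n_items):
        q.put(i)

    seen = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                item = q.get_nowait()
            except queue.Empty:
                return  # queue drained; worker exits
            with lock:
                seen.append(item)  # guard the shared list

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


# Every item should be consumed exactly once, in some interleaved order.
assert sorted(run_concurrent_consumers()) == list(range(100))
```

`queue.Queue` is itself thread-safe; the explicit lock only protects the result list, which is the part a custom reader implementation would need to get right.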


Line range hint 1-199: Summary: Changes align with PR objectives, but thread-based approach not visible.

The modifications in this file successfully replace LabelsReader with LabelsReaderDP, which aligns with the PR objective of removing IterDataPipe. However, the thread-based approach mentioned in the PR objectives is not visible in these changes.

Consider the following suggestions for further development:

  1. Implement and test the thread-based approach mentioned in the PR objectives.
  2. Update the docstring of the test_part_affinity_fields_generator function to reflect the use of LabelsReaderDP.
  3. If LabelsReaderDP has any new features or behaviors different from LabelsReader, consider adding new test cases to cover these aspects.

These additions would ensure that the new implementation is thoroughly tested and documented, improving the overall quality and maintainability of the codebase.

sleap_nn/data/pipelines.py (2)

255-255: LGTM with a minor suggestion

The changes look good and align with the PR objective of removing the IterDataPipe dependency. The update to LabelsReaderDP in the docstring and the addition of centroids_confidence_maps to the keep_keys list are appropriate.

Consider adding a brief comment explaining the purpose of the centroids_confidence_maps key, especially if it's a new addition to the pipeline output. This would improve code readability and maintainability.

Also applies to: 279-279


356-356: LGTM with a suggestion for consistency

The changes are appropriate and align with the PR objective. The update to LabelsReaderDP in the docstring and the addition of part_affinity_fields to the keep_keys list are correct.

For consistency with the CentroidConfmapsPipeline class, consider adding a brief comment explaining the purpose of the part_affinity_fields key in the keep_keys list. This would improve code readability and maintain a consistent documentation style across pipeline classes.

Also applies to: 366-366

tests/inference/test_predictors.py (3)

87-93: Approve the introduction of max_instances variable

The introduction of the max_instances variable and its usage in the function call improves code readability and maintainability. This change allows for easier modification of the maximum instances value in the future.

Consider moving the max_instances variable to the top of the function or even making it a constant at the module level if it's used in multiple test functions. This would further improve the code's organization and make it easier to adjust this value across multiple tests if needed.


97-99: Approve the updated assertion using max_instances

The assertion has been correctly updated to use the max_instances variable, which aligns with the earlier changes. The use of <= in the assertion allows for flexibility in the number of instances, up to the maximum.

To improve clarity, consider adding a comment explaining why the number of centroids might be less than or equal to max_instances. This would help future readers understand the expected behavior more easily.


Line range hint 1-450: Summary of changes and recommendations

The changes in this file primarily involve adjustments to test parameters and expected outcomes for various predictor classes. Here are the key points that need attention:

  1. The lowering of peak_threshold values in multiple tests may impact the number of detected instances. Verify that these changes align with the expected behavior of the predictors.

  2. The significant increase in the expected number of labels (from 100 to 1100) in the test_single_instance_predictor function needs clarification, especially considering the addition of the videoreader_end_idx parameter limiting frames to 100.

  3. The introduction of the max_instances variable improves code readability, but consider moving it to a more prominent location for better visibility.

  4. Some assertions have been updated to reflect new output structures or use new variables. Ensure these changes accurately represent the expected behavior of the predictors.

Overall, while the changes seem to be part of updating the tests to match new predictor behaviors, it's crucial to verify that all modifications align with the intended functionality and performance of the predictors. Consider adding comments to explain significant changes in expected outcomes or test parameters to improve code maintainability and clarity for future readers.

tests/training/test_model_trainer.py (2)

73-73: Document new parameter and verify test setup

A new parameter minimal_instance_bottomup_ckpt: str has been added to the test_trainer function. This parameter is likely used for weight loading verification.

Please consider the following improvements:

  1. Add a docstring to explain the purpose and expected format of the minimal_instance_bottomup_ckpt parameter.
  2. Ensure that the test setup (e.g., in conftest.py or similar) provides this checkpoint file correctly.

Example docstring:

def test_trainer(config, tmp_path: str, minimal_instance_bottomup_ckpt: str):
    """
    Test the ModelTrainer class.

    Args:
        config: The configuration object for the test.
        tmp_path (str): Temporary path for saving test outputs.
        minimal_instance_bottomup_ckpt (str): Path to a minimal instance bottom-up model checkpoint for weight verification.
    """
    # ... (rest of the function)

280-297: Approve weight loading verification with a suggestion

The addition of weight loading verification is a good practice to ensure model consistency between saved checkpoints and loaded models.

Consider increasing the precision of the weight comparison:

-    assert np.all(np.abs(first_layer_ckpt - model_ckpt) < 1e-3)
+    assert np.allclose(first_layer_ckpt, model_ckpt, atol=1e-5, rtol=1e-5)

This change uses np.allclose() with stricter tolerance levels, which can catch smaller discrepancies in weight loading.
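The difference matters most for small-magnitude weights: a drift that slips under a fixed 1e-3 absolute bound can still be large relative to the weight itself. A standalone illustration (toy arrays, not the actual checkpoint tensors):

```python
import numpy as np

saved = np.array([0.5, -1.2])
loaded = np.array([0.5005, -1.2])  # 5e-4 drift on a small weight

# Loose check: absolute difference only, as in the original assertion.
loose = bool(np.all(np.abs(saved - loaded) < 1e-3))
# Strict check: np.allclose passes iff |a - b| <= atol + rtol * |b| elementwise.
strict = bool(np.allclose(saved, loaded, atol=1e-5, rtol=1e-5))
print(loose, strict)  # True False
```

The 5e-4 drift is within 1e-3 absolutely, but it is 0.1% of the weight's value, far above the combined tolerance of roughly 1.5e-5, so the stricter check catches it.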

tests/inference/test_topdown.py (3)

85-85: Simplify dictionary key check

Instead of using key not in dict.keys(), use key not in dict for a more idiomatic and efficient check.

Apply this diff to simplify the key check:

-assert "instance_image" not in out.keys()
+assert "instance_image" not in out
🧰 Tools
🪛 Ruff

85-85: Use key not in dict instead of key not in dict.keys()

Remove .keys()

(SIM118)


113-120: Review commented-out code in data processing

The line # ex["centroids"] = generate_centroids(ex["instances"], 0) is commented out. If this code is no longer necessary, consider removing it. If it is needed, uncomment and ensure it functions correctly.


123-126: Improve parameter alignment in CentroidCrop

For better readability, align the parameters in the CentroidCrop instantiation vertically.

Apply this diff to adjust the alignment:

     topdown_inf_layer = TopDownInferenceModel(
-        centroid_crop=CentroidCrop(
-            use_gt_centroids=True, anchor_ind=0, crop_hw=(160, 160)
-        ),
+        centroid_crop=CentroidCrop(
+            use_gt_centroids=True,
+            anchor_ind=0,
+            crop_hw=(160, 160)
+        ),
         instance_peaks=FindInstancePeaksGroundTruth(),
     )
sleap_nn/data/providers.py (1)

Line range hint 155-186: Refactor to eliminate code duplication in data processing.

The logic for processing LabeledFrame objects into sample dictionaries is duplicated in both LabelsReaderDP.__iter__ and LabelsReader.run. Consider refactoring by utilizing the existing process_lf function to reduce code redundancy and improve maintainability.

Also applies to: 339-380

sleap_nn/training/model_trainer.py (3)

584-584: Simplify dictionary iteration by removing .keys()

When iterating over dictionary keys, it's more efficient to use for k in dict instead of for k in dict.keys().

Apply this diff to simplify:

        ckpt["state_dict"] = {
            k: ckpt["state_dict"][k]
-           for k in ckpt["state_dict"].keys()
+           for k in ckpt["state_dict"]
            if ".head" not in k
        }
🧰 Tools
🪛 Ruff

584-584: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


578-578: Address the TODO: Handle different input channels

There's a TODO comment indicating the need to handle different input channels. This is important for ensuring compatibility when loading pre-trained weights with models expecting different input dimensions.

Would you like assistance in implementing support for different input channels? I can help generate a solution or open a GitHub issue to track this task.


510-510: Use Optional[str] for trained_ckpts_path type annotation

The trained_ckpts_path parameter is currently annotated as str = None, which may cause confusion since None is not an instance of str. To improve type correctness, consider using Optional[str] = None for parameters that can be None.

Apply this diff to update the type annotations:

     def __init__(
         self,
         config: OmegaConf,
         skeletons: Optional[List[sio.Skeleton]],
         model_type: str,
-        trained_ckpts_path: str = None,
+        trained_ckpts_path: Optional[str] = None,
     ):

Apply similar changes in all occurrences where trained_ckpts_path is defined.

Also applies to: 692-692, 765-765, 838-838, 911-911
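A minimal illustration of the annotation (hypothetical function, not the actual trainer signature): `Optional[str]` tells type checkers that `None` is a legal value, which a bare `str = None` default does not.

```python
from typing import Optional


def load_weights(trained_ckpts_path: Optional[str] = None) -> str:
    # None means "no pre-trained weights": fall back to random initialization.
    if trained_ckpts_path is None:
        return "random-init"
    return f"loaded:{trained_ckpts_path}"


print(load_weights())             # random-init
print(load_weights("best.ckpt"))  # loaded:best.ckpt
```

At runtime both annotations behave identically; the change only affects static analysis, where `str = None` is flagged as a type error by mypy in strict mode.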

sleap_nn/inference/predictors.py (3)

13-13: Remove unused import 'litdata'

The import litdata is not used in the code and can be safely removed to clean up unnecessary dependencies.

🧰 Tools
🪛 Ruff

13-13: litdata imported but unused

Remove unused import: litdata

(F401)


19-19: Remove unused import 'apply_sizematcher'

The function apply_sizematcher is imported but not used. Consider removing it to maintain code cleanliness.

🧰 Tools
🪛 Ruff

19-19: sleap_nn.data.resizing.apply_sizematcher imported but unused

Remove unused import: sleap_nn.data.resizing.apply_sizematcher

(F401)


27-28: Remove unused imports 'generate_centroids' and 'generate_crops'

The functions generate_centroids and generate_crops are imported but not utilized in the code. Removing unused imports can help in reducing clutter.

🧰 Tools
🪛 Ruff

27-27: sleap_nn.data.instance_centroids.generate_centroids imported but unused

Remove unused import: sleap_nn.data.instance_centroids.generate_centroids

(F401)


28-28: sleap_nn.data.instance_cropping.generate_crops imported but unused

Remove unused import: sleap_nn.data.instance_cropping.generate_crops

(F401)

sleap_nn/inference/topdown.py (1)

531-533: Move eval() call outside the loop

You're calling self.instance_peaks.eval() inside the loop. To avoid redundant calls, move it outside the loop.

Apply this diff to refactor the code:

+        self.instance_peaks.eval()
         for i in batch:
-            self.instance_peaks.eval()
             peaks_output.append(self.instance_peaks(i))
🛑 Comments failed to post (27)
tests/data/test_normalization.py (1)

9-9: 💡 Codebase verification

Incomplete removal of LabelsReader imports and usages.

Several files still import and use LabelsReader, which prevents the complete removal of the IterDataPipe dependency as intended by the PR. Please update the following files to replace LabelsReader with LabelsReaderDP:

  • sleap_nn/inference/predictors.py
  • sleap_nn/data/resizing.py
  • tests/data/test_providers.py
  • tests/data/test_normalization.py
  • tests/data/test_instance_cropping.py
  • tests/data/test_instance_centroids.py
  • tests/data/test_edge_maps.py
  • tests/data/test_confmaps.py
  • tests/data/test_augmentation.py
  • tests/data/test_pipelines.py
🔗 Analysis chain

Approve import change and verify consistency.

The change from LabelsReader to LabelsReaderDP aligns with the PR objective to remove IterDataPipe dependency. This likely implements the new thread-based approach.

Let's verify if this change is consistent across the codebase:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining imports of LabelsReader and usages of LabelsReaderDP

# Search for any remaining imports of LabelsReader
echo "Checking for remaining LabelsReader imports:"
rg --type python "from sleap_nn.data.providers import LabelsReader"

# Search for new imports of LabelsReaderDP
echo "Checking for new LabelsReaderDP imports:"
rg --type python "from sleap_nn.data.providers import LabelsReaderDP"

# Search for any remaining usage of LabelsReader
echo "Checking for remaining LabelsReader usage:"
rg --type python "LabelsReader\."

# Search for new usage of LabelsReaderDP
echo "Checking for new LabelsReaderDP usage:"
rg --type python "LabelsReaderDP\."

Length of output: 470


Script:

#!/bin/bash
# Description: Check for any remaining imports of LabelsReader and usages of LabelsReaderDP

# Search for any remaining imports of LabelsReader
echo "Checking for remaining LabelsReader imports:"
rg -g "*.py" "from sleap_nn\.data\.providers import LabelsReader"

# Search for new imports of LabelsReaderDP
echo "Checking for new LabelsReaderDP imports:"
rg -g "*.py" "from sleap_nn\.data\.providers import LabelsReaderDP"

# Search for any remaining usage of LabelsReader
echo "Checking for remaining LabelsReader usage:"
rg -g "*.py" "LabelsReader\."

# Search for new usage of LabelsReaderDP
echo "Checking for new LabelsReaderDP usage:"
rg -g "*.py" "LabelsReaderDP\."

Length of output: 4193

tests/data/test_instance_cropping.py (1)

13-13: 💡 Codebase verification

Remove remaining imports of LabelsReader.

The shell scripts indicate that LabelsReader is still imported in the following file:

  • sleap_nn/inference/predictors.py:from sleap_nn.data.providers import LabelsReader, VideoReader

Please update this import to LabelsReaderDP to align with the PR objectives.

🔗 Analysis chain

Approve import change and verify its impact.

The import change from LabelsReader to LabelsReaderDP aligns with the PR objective of removing IterDataPipe. This change appears to be part of the transition to a new system.

To ensure this change is consistent across the codebase, please run the following script:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of LabelsReaderDP and absence of LabelsReader

# Test 1: Check for any remaining imports of LabelsReader
echo "Checking for remaining LabelsReader imports:"
rg --type python 'from sleap_nn\.data\.providers import LabelsReader'

# Test 2: Verify the correct import and usage of LabelsReaderDP
echo "Verifying LabelsReaderDP usage:"
rg --type python 'from sleap_nn\.data\.providers import LabelsReaderDP'
rg --type python 'LabelsReaderDP\.from_filename'

Length of output: 461


Script:

#!/bin/bash
# Description: Verify the usage of LabelsReaderDP and absence of LabelsReader

# Test 1: Check for any remaining imports of LabelsReader
echo "Checking for remaining LabelsReader imports:"
rg --type py 'from sleap_nn\.data\.providers import LabelsReader'

# Test 2: Verify the correct import and usage of LabelsReaderDP
echo "Verifying LabelsReaderDP usage:"
rg --type py 'from sleap_nn\.data\.providers import LabelsReaderDP'
rg --type py 'LabelsReaderDP\.from_filename'

Length of output: 3724

tests/data/test_augmentation.py (1)

13-13: 💡 Codebase verification

Remaining LabelsReader Imports Found

The following files still import LabelsReader and should be updated to use LabelsReaderDP:

  • sleap_nn/inference/predictors.py

Please update these imports to ensure consistency and complete the removal of the IterDataPipe dependency.

🔗 Analysis chain

LGTM! Verify other imports across the codebase.

The import change from LabelsReader to LabelsReaderDP aligns with the PR objective of removing the IterDataPipe dependency. This change is correct and follows Python conventions.

To ensure consistency across the codebase, please run the following script to check for any remaining LabelsReader imports:

If the script returns any results, those files may need to be updated to use LabelsReaderDP instead.


Verification script encountered errors. Please run the following corrected script to check for any remaining LabelsReader imports:


Verification script encountered errors. Please run the following corrected script to check for any remaining LabelsReader imports:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining LabelsReader imports

# Test: Search for LabelsReader imports
rg --type python 'from sleap_nn.data.providers import LabelsReader'

Length of output: 321


Script:

#!/bin/bash
# Description: Check for any remaining LabelsReader imports

# Test: Search for LabelsReader imports
rg --type python 'from sleap_nn.data.providers import LabelsReader' || echo "No remaining LabelsReader imports found."

Length of output: 191


Script:

#!/bin/bash
# Description: Check for any remaining LabelsReader imports

# Test: Search for LabelsReader imports using the correct file type
rg --type py 'from sleap_nn.data.providers import LabelsReader' || echo "No remaining LabelsReader imports found."

Length of output: 972

tests/data/test_resizing.py (1)

68-68: 🛠️ Refactor suggestion

Refactor ambiguous variable names file-wide.

The ambiguous variable name l is used consistently throughout the file. To improve code readability and maintainability, it's recommended to refactor all instances of this variable name.

Would you like me to create a script to automate this change across the file? The script would replace all instances of l = LabelsReaderDP.from_filename(...) with labels_reader = LabelsReaderDP.from_filename(...).

🧰 Tools
🪛 Ruff

68-68: Ambiguous variable name: l

(E741)

tests/data/test_confmaps.py (1)

22-22: 💡 Codebase verification

Issue: Remaining instances of LabelsReader detected.

Several instances of LabelsReader remain in the codebase. Please update them to LabelsReaderDP to ensure consistency and maintain compatibility.

  • tests/inference/test_predictors.py
  • tests/tracking/test_tracker.py
  • tests/tracking/candidates/test_local_queues.py
  • tests/tracking/candidates/test_fixed_window.py
  • tests/data/test_providers.py
  • sleap_nn/data/providers.py
  • sleap_nn/inference/predictors.py
  • sleap_nn/data/pipelines.py
🔗 Analysis chain

LGTM: Data pipeline initialization updated.

The change from LabelsReader to LabelsReaderDP is consistent with the PR objectives. The function signature remains the same, which is good for maintaining compatibility.

To ensure the new implementation is fully compatible, please run the following verification script:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all occurrences of LabelsReader have been replaced with LabelsReaderDP

# Test: Search for any remaining instances of LabelsReader
rg --type python 'LabelsReader(?!DP)' -g '!tests/data/test_confmaps.py'

# Note: The search excludes the current file to avoid false positives from the changes we're reviewing.
# If this search returns results, it may indicate places where LabelsReader still needs to be updated.

Length of output: 207


Script:

#!/bin/bash
# Description: Verify that all occurrences of LabelsReader have been replaced with LabelsReaderDP

# Test: Search for any remaining instances of LabelsReader not followed by DP
rg --type python 'LabelsReader(?!DP)' --ignore-file '!tests/data/test_confmaps.py'

# Note: The search excludes the current file to avoid false positives from the changes we're reviewing.
# If this search returns results, it may indicate places where LabelsReader still needs to be updated.

Length of output: 318


Script:

#!/bin/bash
# Description: Verify that all occurrences of LabelsReader have been replaced with LabelsReaderDP

# Test: Search for any remaining instances of LabelsReader not followed by DP using PCRE2 for look-around support
rg --pcre2 --type python 'LabelsReader(?!DP)' --ignore-file '!tests/data/test_confmaps.py'

# Note: The search excludes the current file to avoid false positives from the changes we're reviewing.
# If this search returns results, it may indicate places where LabelsReader still needs to be updated.

Length of output: 192


Script:

#!/bin/bash
# Description: Verify that all occurrences of LabelsReader have been replaced with LabelsReaderDP

# Test: Search for any remaining instances of LabelsReader not followed by DP using PCRE2 for look-around support
rg --pcre2 'LabelsReader(?!DP)' --type py -g '!tests/data/test_confmaps.py'

# Note: The search excludes the current file to avoid false positives from the changes we're reviewing.
# If this search returns results, it may indicate places where LabelsReader still needs to be updated.

Length of output: 4395

sleap_nn/data/resizing.py (1)

7-7: 💡 Codebase verification

IterDataPipe is still in use and should not be removed

The IterDataPipe is still imported and used in the following classes within sleap_nn/data/resizing.py:

  • Resizer(IterDataPipe)
  • PadToStride(IterDataPipe)
  • SizeMatcher(IterDataPipe)

If the intention is to remove IterDataPipe, please refactor these classes accordingly. Otherwise, consider retaining the import to maintain existing functionality.

🔗 Analysis chain

Approve change, but question IterDataPipe usage

The change from LabelsReader to LabelsReaderDP aligns with the PR objective of transitioning away from IterDataPipe. However, I noticed that IterDataPipe is still imported on line 9.

Could you clarify if IterDataPipe is still needed in this file? If not, consider removing the import:

-from torch.utils.data.datapipes.datapipe import IterDataPipe

To verify the usage of IterDataPipe, let's run the following script:

This will help us determine if there are any classes still inheriting from IterDataPipe in this file.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for IterDataPipe usage in the file

# Test: Search for IterDataPipe usage
rg --type python 'class.*\(IterDataPipe\)' sleap_nn/data/resizing.py

Length of output: 187


Script:

#!/bin/bash
# Description: Check for IterDataPipe usage in sleap_nn/data/resizing.py

ast-grep --lang python --pattern 'class $_(IterDataPipe) { $$$ }' sleap_nn/data/resizing.py

Length of output: 546


Script:

#!/bin/bash
# Description: Check for IterDataPipe usage in sleap_nn/data/resizing.py

# Search for 'IterDataPipe' in the specified file
grep -E 'IterDataPipe' sleap_nn/data/resizing.py

Length of output: 997

sleap_nn/data/pipelines.py (1)

255-255: 💡 Codebase verification

Incomplete Removal of LabelsReader: Multiple Instances Still Present

The verification revealed that LabelsReader is still being used in several parts of the codebase:

  • Test Files:

    • tests/inference/test_predictors.py
    • tests/tracking/candidates/test_fixed_window.py
    • tests/tracking/test_tracker.py
    • tests/tracking/candidates/test_local_queues.py
    • tests/data/test_providers.py
  • Implementation Files:

    • sleap_nn/data/providers.py
    • sleap_nn/inference/predictors.py
  • Documentation:

    • sleap_nn/data/pipelines.py (in docstrings)

To fully align with the PR objective of removing the IterDataPipe dependency, please update all instances of LabelsReader to LabelsReaderDP.

🔗 Analysis chain

Summary: Changes align with PR objective and maintain consistency

The modifications to CentroidConfmapsPipeline and BottomUpPipeline classes are consistent and appropriate:

  1. Both classes now use LabelsReaderDP instead of LabelsReader in their docstrings, aligning with the PR objective of removing IterDataPipe dependency.
  2. New keys (centroids_confidence_maps and part_affinity_fields) are added to the respective keep_keys lists, ensuring that the pipeline outputs include the necessary data.

These changes appear to successfully update the data pipeline to work with the new data provider while maintaining the required functionality.

To ensure that these changes are consistent throughout the codebase, please run the following script:

This script will help identify any inconsistencies in the usage of LabelsReaderDP and the new keys across the codebase.

Also applies to: 279-279, 356-356, 366-366


To address the issues found during script execution, please run the updated verification script below:

This updated script includes the --pcre2 flag to support look-around in the first rg command and corrects the file type to py in both commands. Running this should provide the necessary information to verify the changes effectively.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of LabelsReaderDP and the presence of new keys in other pipeline classes.

# Check for any remaining instances of LabelsReader
echo "Checking for remaining instances of LabelsReader:"
rg "LabelsReader(?!DP)" --type python

# Check for consistent usage of new keys in keep_keys lists
echo "Checking for consistent usage of new keys in keep_keys lists:"
rg "keep_keys\s*=\s*\[" --type python -A 10

Length of output: 592


Script:

#!/bin/bash
# Description: Verify the usage of LabelsReaderDP and the presence of new keys in other pipeline classes.

# Check for any remaining instances of LabelsReader using PCRE2 for look-around
echo "Checking for remaining instances of LabelsReader:"
rg --pcre2 "LabelsReader(?!DP)" --type py

# Check for consistent usage of new keys in keep_keys lists
echo "Checking for consistent usage of new keys in keep_keys lists:"
rg "keep_keys\s*=\s*\[" --type py -A 10

Length of output: 7075

tests/data/test_pipelines.py (1)

566-566: 💡 Codebase verification

Issues Found: Incomplete Replacement of LabelsReader with LabelsReaderDP.

The following references to LabelsReader were found and need to be updated:

  • tests/tracking/test_tracker.py: provider="LabelsReader"
  • tests/tracking/candidates/test_fixed_window.py: provider="LabelsReader"
  • tests/tracking/candidates/test_local_queues.py: provider="LabelsReader"
  • tests/inference/test_predictors.py: Multiple instances of provider="LabelsReader" and related comments
  • tests/assets/minimal_instance_bottomup/training_config.yaml: provider: LabelsReader
  • tests/assets/minimal_instance_centroid/training_config.yaml: provider: LabelsReader
  • tests/data/test_providers.py: Multiple instances including class instantiations and docstrings
  • tests/assets/minimal_instance/initial_config.yaml: provider: LabelsReader
  • tests/assets/minimal_instance/training_config.yaml: provider: LabelsReader
  • tests/assets/minimal_instance_centroid/initial_config.yaml: provider: LabelsReader
  • tests/assets/minimal_instance_bottomup/initial_config.yaml: provider: LabelsReader

Please ensure all instances of LabelsReader are replaced with LabelsReaderDP to maintain consistency and functionality.

🔗 Analysis chain

LGTM: LabelsReader consistently replaced with LabelsReaderDP.

These changes are consistent with the previous modifications to use LabelsReaderDP throughout the file. All constructor calls remain unchanged, indicating straightforward replacements.

To ensure the LabelsReaderDP implementation is complete and consistent, run the following script:

Also applies to: 610-610, 688-688, 768-768

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the LabelsReaderDP implementation

# Check for any remaining references to LabelsReader
rg "LabelsReader(?!DP)" tests/

# Compare the method signatures of LabelsReader and LabelsReaderDP
rg -U --multiline "class LabelsReader.*?:.*?\n.*?def" tests/
rg -U --multiline "class LabelsReaderDP.*?:.*?\n.*?def" tests/

Length of output: 489


Script:

#!/bin/bash
# Description: Verify the LabelsReaderDP implementation with corrected commands

# Check for any remaining references to LabelsReader not followed by DP using PCRE2
rg --pcre2 "LabelsReader(?!DP)" tests/

# Compare the method signatures of LabelsReader and LabelsReaderDP using ast-grep
ast-grep --lang python --pattern $'class LabelsReaderDP:
    $$$
    def $$$' tests/

ast-grep --lang python --pattern $'class LabelsReader:
    $$$
    def $$$' tests/

Length of output: 2529

tests/inference/test_single_instance.py (2)

84-84: ⚠️ Potential issue

Simplify membership test in assertion

Instead of checking 'key in dict.keys()', you can directly check 'key in dict' for a more concise and Pythonic approach.

Apply this diff to fix the issue:

-        assert "pred_confmaps" in outputs[0][0].keys()
+        assert "pred_confmaps" in outputs[0][0]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    assert "pred_confmaps" in outputs[0][0]
🧰 Tools
🪛 Ruff

84-84: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


71-72: ⚠️ Potential issue

Avoid in-place modification of ex["image"]

Modifying ex["image"] in place may affect other tests or assertions that rely on the original image. Consider creating a copy of the example or the image before resizing to prevent unintended side effects.

Apply this diff to fix the issue:

-        ex["image"] = resize_image(ex["image"], 0.5)
+        ex_resized = ex.copy()
+        ex_resized["image"] = resize_image(ex_resized["image"], 0.5)
-        outputs.append(find_peaks_layer(ex))
+        outputs.append(find_peaks_layer(ex_resized))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    ex_resized = ex.copy()
    ex_resized["image"] = resize_image(ex_resized["image"], 0.5)
    outputs.append(find_peaks_layer(ex_resized))
tests/data/test_providers.py (5)

110-110: ⚠️ Potential issue

Unused loop variable i should be replaced with _

The loop variable i is not used within the loop body. To indicate that it's intentionally unused, consider renaming it to _.

Apply this diff:

         data = []
-        for i in range(batch_size):
+        for _ in range(batch_size):
             frame = reader.frame_buffer.get()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        data = []
        for _ in range(batch_size):
            frame = reader.frame_buffer.get()
🧰 Tools
🪛 Ruff

110-110: Loop control variable i not used within loop body

Rename unused i to _i

(B007)


142-142: 🛠️ Refactor suggestion

Avoid using bare except: statements

As above, consider specifying the exception type to avoid catching unintended exceptions.

Apply this diff:

-    except:
+    except Exception:
         raise

Committable suggestion was skipped due to low confidence.


133-133: ⚠️ Potential issue

Unused loop variable i should be replaced with _

Similarly, in this loop, the variable i is not used. Renaming it to _ improves code readability.

Apply this diff:

         data = []
-        for i in range(batch_size):
+        for _ in range(batch_size):
             frame = reader.frame_buffer.get()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        data = []
        for _ in range(batch_size):
            frame = reader.frame_buffer.get()
🧰 Tools
🪛 Ruff

133-133: Loop control variable i not used within loop body

Rename unused i to _i

(B007)


119-119: 🛠️ Refactor suggestion

Avoid using bare except: statements

Using a bare except: can catch unexpected exceptions, including system-exiting exceptions. It's better to catch specific exceptions or use except Exception: to handle standard errors.

Apply this diff:

-    except:
+    except Exception:
         raise

Committable suggestion was skipped due to low confidence.
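For context on why the suggestions above matter, here is a minimal, self-contained illustration (not code from sleap-nn) of the difference between a bare `except:` and `except Exception:` — the bare form also swallows `SystemExit` and `KeyboardInterrupt`, which inherit from `BaseException` rather than `Exception`:

```python
# A bare `except:` catches every BaseException subclass, including
# SystemExit; `except Exception:` lets such exceptions propagate.

def swallow_all():
    try:
        raise SystemExit(1)
    except:  # noqa: E722 - bare except, shown here for demonstration only
        return "swallowed"

def swallow_errors_only():
    try:
        raise SystemExit(1)
    except Exception:
        return "swallowed"  # never reached: SystemExit is not an Exception

print(swallow_all())  # -> swallowed
try:
    swallow_errors_only()
except SystemExit:
    print("SystemExit propagated")  # -> SystemExit propagated
```

This is why a worker thread that wraps its loop in a bare `except:` can accidentally suppress interpreter shutdown signals instead of re-raising them.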


18-18: ⚠️ Potential issue

Rename variable l to a more descriptive name

The variable l in line 18 is a single-character variable name, which can be ambiguous and reduce code readability. It's recommended to use more descriptive variable names for better clarity.

Apply this diff to rename the variable:

 def test_providers(minimal_instance):
     """Test LabelsReaderDP module."""
-    l = LabelsReaderDP.from_filename(minimal_instance)
+    labels_reader = LabelsReaderDP.from_filename(minimal_instance)
-    sample = next(iter(l))
+    sample = next(iter(labels_reader))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    labels_reader = LabelsReaderDP.from_filename(minimal_instance)
🧰 Tools
🪛 Ruff

18-18: Ambiguous variable name: l

(E741)

tests/inference/test_topdown.py (4)

11-11: ⚠️ Potential issue

Remove unused imports Resizer and PadToStride

The imports Resizer and PadToStride from sleap_nn.data.resizing are not used in the code and should be removed to reduce clutter.

Apply this diff to remove the unused imports:

-from sleap_nn.data.resizing import SizeMatcher, Resizer, PadToStride
+from sleap_nn.data.resizing import SizeMatcher
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from sleap_nn.data.resizing import SizeMatcher
🧰 Tools
🪛 Ruff

11-11: sleap_nn.data.resizing.Resizer imported but unused

Remove unused import

(F401)


11-11: sleap_nn.data.resizing.PadToStride imported but unused

Remove unused import

(F401)


8-8: ⚠️ Potential issue

Remove unused import resize_image

The import resize_image from sleap_nn.data.resizing is not utilized in the code and should be removed to clean up the imports.

Apply this diff to remove the unused import:

-from sleap_nn.data.resizing import resize_image
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.


🧰 Tools
🪛 Ruff

8-8: sleap_nn.data.resizing.resize_image imported but unused

Remove unused import: sleap_nn.data.resizing.resize_image

(F401)


12-12: ⚠️ Potential issue

Remove unused import InstanceCropper

The import InstanceCropper from sleap_nn.data.instance_cropping is not utilized and should be removed to clean up the code.

Apply this diff to remove the unused import:

-from sleap_nn.data.instance_cropping import InstanceCropper, generate_crops
+from sleap_nn.data.instance_cropping import generate_crops
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from sleap_nn.data.instance_cropping import generate_crops
🧰 Tools
🪛 Ruff

12-12: sleap_nn.data.instance_cropping.InstanceCropper imported but unused

Remove unused import: sleap_nn.data.instance_cropping.InstanceCropper

(F401)


9-9: ⚠️ Potential issue

Remove unused import InstanceCentroidFinder

The import InstanceCentroidFinder from sleap_nn.data.instance_centroids is not used and can be removed to streamline the code.

Apply this diff to remove the unused import:

-from sleap_nn.data.instance_centroids import InstanceCentroidFinder, generate_centroids
+from sleap_nn.data.instance_centroids import generate_centroids
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from sleap_nn.data.instance_centroids import generate_centroids
🧰 Tools
🪛 Ruff

9-9: sleap_nn.data.instance_centroids.InstanceCentroidFinder imported but unused

Remove unused import: sleap_nn.data.instance_centroids.InstanceCentroidFinder

(F401)

sleap_nn/data/providers.py (3)

255-257: ⚠️ Potential issue

Correct the docstring to reflect the method functionality.

The from_filename method in VideoReader is intended to load a video file, but the docstring incorrectly mentions a .slp file, which is a labels file. Update the docstring to accurately describe the method's purpose.

Apply this diff:

     @classmethod
     def from_filename(
         cls,
         filename: str,
         queue_maxsize: int,
         start_idx: Optional[int] = None,
         end_idx: Optional[int] = None,
     ):
-        """Create VideoReader from a .slp filename."""
+        """Create VideoReader from a video filename."""
         video = sio.load_video(filename)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        """Create VideoReader from a video filename."""
        video = sio.load_video(filename)
        frame_buffer = Queue(maxsize=queue_maxsize)

383-385: 🛠️ Refactor suggestion

Use logging instead of print statements for error reporting.

Similarly, in LabelsReader.run, replace print statements with logging.error for consistent and configurable error logging.

Apply this diff:

 # At the top of the file, add:
+import logging

 # Inside the exception handling block:
 except Exception as e:
-    print(f"Error when reading labelled frame. Stopping labels reader.\n{e}")
+    logging.error(f"Error when reading labelled frame. Stopping labels reader.\n{e}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

import logging

        except Exception as e:
            logging.error(f"Error when reading labelled frame. Stopping labels reader.\n{e}")

280-282: 🛠️ Refactor suggestion

Use logging instead of print statements for error reporting.

In the exception handling of VideoReader.run, replace print statements with the logging module for better logging practices and configurable error handling.

Apply this diff:

 # At the top of the file, add:
+import logging

 # Inside the exception handling block:
 except Exception as e:
-    print(f"Error when reading video frame. Stopping video reader.\n{e}")
+    logging.error(f"Error when reading video frame. Stopping video reader.\n{e}")

Committable suggestion was skipped due to low confidence.

sleap_nn/training/model_trainer.py (1)

579-588: ⚠️ Potential issue

Security Risk: Unrestricted unpickling with torch.load can lead to code execution

The use of torch.load(trained_ckpts_path) without proper safeguards can pose a security risk. Unpickling data from untrusted sources can execute arbitrary code. Ensure that trained_ckpts_path comes from a trusted source, and consider using a more secure method for loading model weights.

Additionally, it's important to add error handling to manage cases where the checkpoint file might be missing or corrupted.

Apply the following changes to enhance security and error handling:

+import pickle
+import errno
...
    if trained_ckpts_path is not None:
        print(f"Loading weights from `{trained_ckpts_path}` ...")
+       if not os.path.isfile(trained_ckpts_path):
+           raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), trained_ckpts_path)
        try:
-           ckpt = torch.load(trained_ckpts_path)
+           with open(trained_ckpts_path, 'rb') as f:
+               ckpt = torch.load(f, map_location=self.device, pickle_module=pickle)
        except Exception as e:
            raise RuntimeError(f"Error loading checkpoint: {e}")
        ckpt["state_dict"] = {
            k: ckpt["state_dict"][k]
            for k in ckpt["state_dict"]
            if ".head" not in k
        }
        self.load_state_dict(ckpt["state_dict"], strict=False)

Committable suggestion was skipped due to low confidence.

🧰 Tools
🪛 Ruff

584-584: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)
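The head-stripping step in the suggestion above is independent of the checkpoint-loading concern and is just a dictionary filter, so it can be sketched without PyTorch. The key names below are hypothetical placeholders, not actual sleap-nn layer names:

```python
# Drop head-layer weights from a checkpoint's state dict so that only
# backbone weights are transferred to the new model (key names are
# hypothetical examples).
ckpt = {
    "state_dict": {
        "backbone.conv1.weight": [1.0],
        "backbone.conv2.weight": [2.0],
        "model.head.linear.weight": [3.0],
    }
}

# Equivalent to the dict comprehension in the suggested diff, but iterating
# items() directly avoids the `k in dict.keys()` anti-pattern (SIM118).
filtered = {k: v for k, v in ckpt["state_dict"].items() if ".head" not in k}
print(sorted(filtered))  # -> ['backbone.conv1.weight', 'backbone.conv2.weight']
```

The filtered dict would then be passed to `load_state_dict(..., strict=False)` so that the missing head keys are tolerated.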

sleap_nn/inference/predictors.py (3)

634-634: ⚠️ Potential issue

Use ValueError instead of Exception for invalid provider

Raising a ValueError is more appropriate when indicating an invalid argument value. It provides clearer information about the nature of the error.

Apply this diff to improve error handling:

- raise Exception(
+ raise ValueError(
    "Provider not recognised. Please use either `LabelsReader` or `VideoReader` as provider"
)

Committable suggestion was skipped due to low confidence.


1244-1244: ⚠️ Potential issue

Use ValueError instead of Exception for invalid provider

Switching to ValueError for invalid argument values aligns with Python's standard exception handling practices.

Apply this diff to improve error handling:

- raise Exception(
+ raise ValueError(
    "Provider not recognised. Please use either `LabelsReader` or `VideoReader` as provider"
)

Committable suggestion was skipped due to low confidence.


305-305: ⚠️ Potential issue

Call self.pipeline.stop() before self.pipeline.join()

To ensure proper termination of the pipeline threads, call self.pipeline.stop() before joining. This allows the pipeline to cease operations gracefully before the main thread waits for it to finish.

Apply this diff to address the issue:

  # At the end of _predict_generator method
+ self.pipeline.stop()
  self.pipeline.join()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        self.pipeline.stop()
        self.pipeline.join()
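As a minimal sketch of the stop-before-join ordering (a hypothetical reader class, not the actual sleap-nn implementation): the producer thread checks a stop flag on each iteration, so setting the flag before calling `join()` lets the thread exit its loop cleanly instead of the main thread waiting on a producer that is still pushing frames.

```python
import threading
from queue import Queue

class FrameReader(threading.Thread):
    """Toy producer that pushes frame dicts into a bounded buffer."""

    def __init__(self, n_frames: int, queue_maxsize: int = 4):
        super().__init__()
        self.n_frames = n_frames
        self.frame_buffer = Queue(maxsize=queue_maxsize)
        self._stopped = threading.Event()

    def run(self):
        for idx in range(self.n_frames):
            if self._stopped.is_set():
                break  # graceful early exit when stop() was requested
            self.frame_buffer.put({"frame_idx": idx})
        self.frame_buffer.put(None)  # sentinel: no more frames

    def stop(self):
        self._stopped.set()

reader = FrameReader(n_frames=3)
reader.start()
frames = []
while (item := reader.frame_buffer.get()) is not None:
    frames.append(item["frame_idx"])
reader.stop()   # signal the producer to stop...
reader.join()   # ...then wait for it to finish
print(frames)  # -> [0, 1, 2]
```

Note that the queue is sized so the sentinel always fits; in a real pipeline the consumer must keep draining the buffer (or the buffer must have room) so the producer's `put()` cannot block forever after `stop()` is set.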
sleap_nn/inference/topdown.py (1)

518-523: ⚠️ Potential issue

Simplify nested if statements

You can simplify the nested if statements into a single condition.

Apply this diff to streamline the code:

-if isinstance(self.instance_peaks, FindInstancePeaksGroundTruth):
-    if "instances" not in batch:
-        raise ValueError(
-            "Ground truth data was not detected... "
-            "Please load both models when predicting on non-ground-truth data."
-        )
+if isinstance(self.instance_peaks, FindInstancePeaksGroundTruth) and "instances" not in batch:
+    raise ValueError(
+        "Ground truth data was not detected... "
+        "Please load both models when predicting on non-ground-truth data."
+    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        if isinstance(self.instance_peaks, FindInstancePeaksGroundTruth) and "instances" not in batch:
            raise ValueError(
                "Ground truth data was not detected... "
                "Please load both models when predicting on non-ground-truth data."
            )
🧰 Tools
🪛 Ruff

518-519: Use a single if statement instead of nested if statements

(SIM102)

@gitttt-1234 gitttt-1234 force-pushed the divya/remove-torchdata-inference branch from 0da9ad2 to 80023b3 Compare October 3, 2024 22:05
@gitttt-1234 gitttt-1234 force-pushed the divya/remove-torchdata-inference branch from 7a2cce7 to b89af63 Compare October 3, 2024 22:31
@gitttt-1234 gitttt-1234 changed the base branch from divya/load-backbone-weights to main October 3, 2024 22:31
@gitttt-1234 gitttt-1234 merged commit 2f0fedc into main Oct 3, 2024
7 checks passed
@gitttt-1234 gitttt-1234 deleted the divya/remove-torchdata-inference branch October 3, 2024 22:57
@gitttt-1234 gitttt-1234 restored the divya/remove-torchdata-inference branch October 3, 2024 23:08
@gitttt-1234 gitttt-1234 deleted the divya/remove-torchdata-inference branch October 3, 2024 23:39